⚡ Tokenizer Optimization - abnv · Scour

Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2 🗺️Region Inference

aws.amazon.com·3d·

Beating Python’s GIL: Achieving a 130x Speedup in Batch Processing with Rust and Rayon 🦀MIR Optimization

medium.com·2d·

Building CompilerSutra 🎓Teaching Compilers

docs.google.com·20h·DEV·

OmniVoice, high-quality TTS for 600+ Languages 🔄Incremental Lexing

zhu-han.github.io·11h·Hacker News·

Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication 🗺️Region Inference

releases.drawthings.ai·1d·Hacker News·

Speculative Decoding: Performance or Illusion? 🗺️Region Inference

specdecode-bench.github.io·6d·Hacker News·

How we chose Positron’s Python type checker ✅Type Checking

positron.posit.co·2d·Hacker News·

General scales unlock AI evaluation with explanatory and predictive power 🪜Recursive Descent

nature.com·1d·

Context Rot: How Increasing Input Tokens Impacts LLM Performance 🔍Tokenizers

trychroma.com·6d·DEV·

Donald Raab: Measuring the Startup Memory Cost for Lazy Iteration Patterns in Java 🗑️Garbage Collection

donraab.medium.com·2d·

APL Performance 🔀SIMD Programming

aplwiki.com·3d·Hacker News·

Intel Delivers Open, Scalable AI Performance in MLPerf Inference v6.0 🗺️Region Inference

newsroom.intel.com·1d·

Supercharging Redpanda Streaming with profile-guided optimization 📈Performance Tools

redpanda.com·1d·

yash27-lab/batch_forge: A high-performance, bare-metal inference engine for JAX and Equinox models written in Rust. Features zero-copy Safetensors loading and hand-optimized Metal/Vulkan compute kernels for Transformers, Vision Language Models, and State-Space Models 🗺️Region Inference

github.com·3d·Hacker News·

Iteratively optimizing an SPSC queue 🎯Ring Buffers

blog.c21-mac.com·4d·r/cpp·

MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX 🔬Nanopasses

danielvegamyhre.github.io·5d·Hacker News·

Scaling AI Workloads in Java Without Breaking Your APIs ⚡Interpreter Optimization

dzone.com·6d·

Discord Engineers Add Distributed Tracing to Elixir's Actor Model Without Performance Penalty ✨Gleam

infoq.com·5d·

Systematic Analysis of CPU-Induced Slowdowns in Multi-GPU LLM Inference (Georgia Tech) 🗺️Region Inference

semiengineering.com·6d·

Designing High-Concurrency Databricks Workloads Without Performance Degradation 🗑️Concurrent GC

dzone.com·6d·
